Regularized Interlingual Projections: Evaluation on Multilingual Transliteration

Authors

  • Jagadeesh Jagarlamudi
  • Hal Daumé

Abstract

In this paper, we address the problem of building a multilingual transliteration system using an interlingual representation. Our approach uses the International Phonetic Alphabet (IPA) to learn the interlingual representation, which allows us to use any word and its IPA representation as a training example. The approach therefore requires only monolingual resources: a phoneme dictionary that lists words and their IPA representations. By adding a phoneme dictionary for a new language, we can readily build a transliteration system into any of the existing languages, without the expense of all-pairs data or computation. We also propose a regularization framework for learning the interlingual representation; it accounts for language-specific phonemic variability and can therefore find better mappings between languages. Experimental results on the name transliteration task in five diverse languages show a maximum improvement of 29% in accuracy and an average improvement of 17% in accuracy over a state-of-the-art baseline system.
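
The pivot idea in the abstract (each language contributes only a word-to-IPA dictionary, and any pair of languages is bridged through IPA) can be illustrated with a toy sketch. This is not the paper's method: the learned, regularized interlingual projection is replaced here by a plain edit-distance lookup over IPA strings. The names PHONEME_DICTS, edit_distance, and transliterate are hypothetical, and the dictionary entries are rough illustrative transcriptions, not data from the paper.

```python
# Toy illustration of IPA as a pivot between languages.
# Assumption: nearest-neighbour search by edit distance stands in for the
# learned interlingual representation described in the abstract.

def edit_distance(a, b):
    """Levenshtein distance between two strings (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]

# One monolingual phoneme dictionary per language (hypothetical entries).
PHONEME_DICTS = {
    "en": {"london": "lʌndən", "paris": "pærɪs"},
    "hi": {"लंदन": "ləndən", "पेरिस": "peːrɪs"},
}

def transliterate(word, src, tgt, dicts=PHONEME_DICTS):
    """Return the target-language word whose IPA is closest to the source word's IPA."""
    ipa = dicts[src][word]
    return min(dicts[tgt], key=lambda w: edit_distance(ipa, dicts[tgt][w]))

# Adding one new monolingual dictionary is enough to reach every existing
# language; no Russian-Hindi pair data is involved.
PHONEME_DICTS["ru"] = {"лондон": "londən"}
result = transliterate("лондон", "ru", "hi")
```

The sketch can only return words already listed in the target dictionary, whereas the system described in the abstract learns a representation intended to generalize to unseen names.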

Similar resources

Discriminative Interlingual Representations

Jagadeesh Jagarlamudi, Doctor of Philosophy, 2013. Dissertation advised by: Hal Daumé III, Department of Computer Science. The language barrier in many multilingual natural language processing (NLP) tasks can be overcome by mapping objects from different languages (“views”) into a common low-dimensional subspace. For example, the name...

An Interlingual Lexical Organisation Based on Acceptions From the PARAX mock-up to the NADIA system

Many projects are conducted to develop multilingual lexical databases. Some of these projects use an interlingual approach (KBMT-89, EDR, ...), where others choose a bilingual approach (Multilex, ...). This paper presents an interlingual approach based on acceptions (word-senses) aiming at the development of a multilingual lexical database management system: NADIA. With this approach, the inter...

Interlingual Lexical Organisation For Multilingual Lexical Databases In NADIA

We propose a lexical organisation for multilingual lexical databases (MLDB). This organisation is based on acceptions (word-senses). We detail this lexical organisation and show a mock-up built to experiment with it. We also present our current work in defining and prototyping a specialised system for the management of acception-based MLDB.

Quillpad Multilingual Predictive Transliteration System

Transliteration has been one of the common methods for multilingual text input. Many earlier methods employed transliteration schemes that define a one-to-one mapping from input character combinations to output character combinations. Though such well-defined mappings made it easier to write a transliteration program, the end user was burdened with learning the mappings. Further, though transliter...

Interlingual Aspects Of Wikipedia's Quality

This paper presents interim results of an ongoing project on quality issues concerning Wikipedia. One focus of research is the relation of language and quality measurement. The other one is the use of interlingual relations for quality assessment and improvement. The study is based on mono- and multilingual samples of featured and non-featured Wikipedia articles in English, French, German, and It...

Publication date: 2012